
    Cascaded 3D Full-body Pose Regression from Single Depth Image at 100 FPS

    Capturing and retargeting 3D human pose plays an important role in a growing number of real-time live applications in virtual reality, yet estimating accurate 3D pose from consumer imaging devices such as depth cameras remains challenging. This paper presents a novel cascaded 3D full-body pose regression method that estimates accurate pose from a single depth image at 100 fps. The key idea is to train cascaded regressors, based on the gradient boosting algorithm, from a pre-recorded human motion capture database. By incorporating a hierarchical kinematic model of human pose into the learning procedure, we directly estimate accurate 3D joint angles instead of joint positions. The biggest advantage of this model is that bone lengths are preserved throughout the 3D pose estimation procedure, which leads to more effective features and higher pose estimation accuracy. Our method can also serve as an initialization procedure when combined with tracking methods. We demonstrate its power on a wide range of synthesized human motion data from the CMU mocap database, the Human3.6M dataset, and real human movements captured in real time. In comparisons against previous 3D pose estimation methods and commercial systems such as Kinect 2017, we achieve state-of-the-art accuracy.
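
    The cascaded idea can be sketched in a few lines. The following Python sketch is illustrative only: the random arrays, the extract_features helper, and the choice of scikit-learn's GradientBoostingRegressor are assumptions standing in for the paper's depth-difference features and kinematics-aware regressors; only the train-a-stage-on-the-residual loop reflects the abstract.

# Minimal sketch of cascaded residual regression on joint angles (illustrative data).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(0)
depth_feats = rng.random((200, 64))     # placeholder per-image depth features
true_angles = rng.random((200, 30))     # placeholder ground-truth joint angles

def extract_features(depth, angles):
    # Hypothetical pose-indexed features; the paper instead samples depth
    # differences relative to the current kinematic pose.
    return np.concatenate([depth, angles], axis=1)

n_stages = 4
cascade = []
angles = np.tile(true_angles.mean(axis=0), (len(depth_feats), 1))  # start from the mean pose
for _ in range(n_stages):
    X = extract_features(depth_feats, angles)
    stage = MultiOutputRegressor(GradientBoostingRegressor(n_estimators=50, max_depth=3))
    stage.fit(X, true_angles - angles)          # regress the remaining residual
    angles = angles + stage.predict(X)          # refine joint angles; bone lengths stay fixed
    cascade.append(stage)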

    Variational Autoencoders for Deforming 3D Mesh Models

    3D geometric content is becoming increasingly popular. In this paper, we study the problem of analyzing deforming 3D meshes using deep neural networks. Deforming 3D meshes are flexible for representing 3D animation sequences as well as collections of objects of the same category, allowing diverse shapes with large-scale non-linear deformations. We propose a novel framework, which we call mesh variational autoencoders (mesh VAE), to explore the probabilistic latent space of 3D surfaces. The framework is easy to train and requires very few training examples. We also propose an extended model which allows flexibly adjusting the significance of different latent variables by altering the prior distribution. Extensive experiments demonstrate that our general framework is able to learn a reasonable representation for a collection of deformable shapes and produce competitive results for a variety of applications, including shape generation, shape interpolation, shape space embedding, and shape exploration, outperforming state-of-the-art methods. Comment: CVPR 2018.
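
    As a rough illustration of the model class, the sketch below is a plain fully connected VAE over flattened per-mesh deformation features. The feature dimension, layer sizes, and the kl_weight knob (standing in for the adjustable prior described above) are assumptions; the paper's actual deformation representation and architecture are not reproduced here.

# Schematic mesh VAE: encode a deformation feature vector, sample a latent, decode.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MeshVAE(nn.Module):
    def __init__(self, feat_dim=9 * 5000, latent_dim=128, hidden=512):
        super().__init__()
        self.enc = nn.Linear(feat_dim, hidden)
        self.mu = nn.Linear(hidden, latent_dim)
        self.logvar = nn.Linear(hidden, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, hidden), nn.Tanh(),
                                 nn.Linear(hidden, feat_dim))

    def forward(self, x):
        h = torch.tanh(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)   # reparameterization trick
        return self.dec(z), mu, logvar

def vae_loss(recon, x, mu, logvar, kl_weight=1.0):
    # Reconstruction term plus KL divergence to the prior; kl_weight is a
    # stand-in for adjusting the influence of the prior distribution.
    rec = F.mse_loss(recon, x)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kl_weight * kl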

    Probabilistic Triangulation for Uncalibrated Multi-View 3D Human Pose Estimation

    3D human pose estimation has been a long-standing challenge in computer vision and graphics, where multi-view methods have made significant progress but are limited by tedious calibration processes. Existing multi-view methods are restricted to fixed camera poses and therefore lack generalization ability. This paper presents a novel Probabilistic Triangulation module that can be embedded in a calibrated 3D human pose estimation method, generalizing it to uncalibrated scenes. The key idea is to model the camera pose with a probability distribution and iteratively update this distribution from 2D features instead of relying on a known camera pose. Specifically, we maintain a camera pose distribution and iteratively update it by computing the posterior probability of the camera pose through Monte Carlo sampling. This way, gradients can be back-propagated directly from the 3D pose estimation to the 2D heatmaps, enabling end-to-end training. Extensive experiments on Human3.6M and CMU Panoptic demonstrate that our method outperforms other uncalibrated methods and achieves results comparable to state-of-the-art calibrated methods. Thus, our method achieves a trade-off between estimation accuracy and generalizability. Our code is at https://github.com/bymaths/probabilistic_triangulation. Comment: 9 pages, 5 figures, conference.
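
    A toy version of the sampling loop conveys the flavor. Everything below is a stand-in: project() is a made-up linear projection, the Gaussian reprojection likelihood replaces the paper's heatmap-based posterior, and no gradients flow here; only the sample-weight-resample structure mirrors the described Monte Carlo update of the camera-pose distribution.

# Toy Monte Carlo refinement of a camera-pose distribution from 2D evidence.
import numpy as np

rng = np.random.default_rng(0)

def project(pose_params, points_3d):
    # Hypothetical projection: a real pipeline would apply the rotation,
    # translation, and intrinsics encoded by the camera pose.
    return points_3d @ pose_params.reshape(2, 3).T

points_3d = rng.random((17, 3))                          # current 3D joint hypothesis
keypoints_2d = project(rng.normal(size=6), points_3d)    # observed 2D joints

samples = rng.normal(size=(500, 6))                      # initial camera-pose samples
for _ in range(10):
    errors = np.array([np.sum((project(s, points_3d) - keypoints_2d) ** 2)
                       for s in samples])
    weights = np.exp(-(errors - errors.min()) / 0.1)     # reprojection likelihood (stabilized)
    weights /= weights.sum()
    idx = rng.choice(len(samples), size=len(samples), p=weights)
    samples = samples[idx] + rng.normal(scale=0.05, size=samples.shape)  # resample and perturb

camera_pose_estimate = samples.mean(axis=0)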

    AttT2M: Text-Driven Human Motion Generation with Multi-Perspective Attention Mechanism

    Generating 3D human motion based on textual descriptions has been a research focus in recent years. It requires the generated motion to be diverse, natural, and consistent with the textual description. Due to the complex spatio-temporal nature of human motion and the difficulty of learning the cross-modal relationship between text and motion, text-driven motion generation remains a challenging problem. To address these issues, we propose AttT2M, a two-stage method with a multi-perspective attention mechanism: body-part attention and global-local motion-text attention. The former focuses on the motion-embedding perspective, introducing a body-part spatio-temporal encoder into a VQ-VAE to learn a more expressive discrete latent space. The latter takes the cross-modal perspective and learns the sentence-level and word-level motion-text cross-modal relationships. The text-driven motion is finally generated with a generative transformer. Extensive experiments conducted on HumanML3D and KIT-ML demonstrate that our method outperforms the current state-of-the-art works in terms of qualitative and quantitative evaluation, and achieves fine-grained synthesis and action2motion. Our code is at https://github.com/ZcyMonkey/AttT2M. Comment: IEEE International Conference on Computer Vision 2023, 9 pages.
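
    The global-local cross-modal step can be pictured with two stock attention layers. The dimensions, the random placeholder tensors, and the simple additive fusion below are assumptions; the body-part VQ-VAE encoder and the generative transformer from the abstract are not reproduced.

# Schematic sentence-level (global) and word-level (local) motion-text cross-attention.
import torch
import torch.nn as nn

d_model = 256
sentence_attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
word_attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)

motion_tokens = torch.randn(2, 50, d_model)   # embedded discrete motion tokens
sentence_emb = torch.randn(2, 1, d_model)     # sentence-level text feature
word_embs = torch.randn(2, 12, d_model)       # word-level text features

global_ctx, _ = sentence_attn(motion_tokens, sentence_emb, sentence_emb)  # global conditioning
local_ctx, _ = word_attn(motion_tokens, word_embs, word_embs)             # local conditioning
conditioned = motion_tokens + global_ctx + local_ctx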

    Rigidity controllable as-rigid-as-possible shape deformations

    Shape deformation is one of the fundamental techniques in geometric processing. One principle of deformation is to preserve geometric details while distributing the necessary distortion uniformly. To achieve this, state-of-the-art techniques deform shapes in a locally as-rigid-as-possible (ARAP) manner. Existing ARAP deformation methods optimize rigid transformations in 1-ring neighborhoods and maintain consistency between adjacent pairs of rigid transformations through single overlapping edges. In this paper, we go one step further and propose to use larger local neighborhoods to enhance the consistency of adjacent rigid transformations. This helps preserve geometric details better and distribute the distortion more uniformly. Moreover, the size of the expanded local neighborhoods provides an intuitive parameter for adjusting physical stiffness: the larger the neighborhood, the more rigid the material. Based on this, we propose a novel rigidity-controllable mesh deformation method in which shape rigidity can be flexibly adjusted. The size of the local neighborhoods can be learned automatically from datasets of deforming objects or specified by the user, and may vary over the surface to simulate shapes composed of mixed materials. Various examples are provided to demonstrate the effectiveness of our method.
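
    For reference, the method builds on the standard ARAP energy (written here in our notation, not the paper's), with the summation neighborhood generalized from the 1-ring to a larger r-ring:

\[
E(S') = \sum_{i} \sum_{j \in \mathcal{N}_r(i)} w_{ij} \,\bigl\| (\mathbf{p}'_i - \mathbf{p}'_j) - \mathbf{R}_i (\mathbf{p}_i - \mathbf{p}_j) \bigr\|^2 ,
\]

    where \(\mathbf{p}_i\) and \(\mathbf{p}'_i\) are the original and deformed vertex positions, \(\mathbf{R}_i\) is the best-fitting local rotation, \(w_{ij}\) are the usual cotangent weights, and \(\mathcal{N}_r(i)\) is the r-ring neighborhood. Setting r = 1 recovers the classic 1-ring formulation, and larger r yields stiffer behavior, which is the rigidity control described in the abstract.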